ABSTRACT


While many works have been devoted to service matchmak- 
ing and modeling nonfunctional properties, the problem of 
matching service requests to offers in an optimal way has 
not yet been extensively studied. In this paper we formalize 
three kinds of optimal service selection problems, based on 
different criteria. Then we study their complexity and im- 
plement solutions. We prove that one-time costs make the 
optimal selection problem computationally hard; in the ab- 
sence of these costs the problem can be solved in polynomial 
time. We designed and implemented both exact and heuris- 
tic (suboptimal) algorithms for the hard case, and carried 
out a preliminary experimental evaluation with interesting 
results. 


Categories and Subject Descriptors 


H.3.5 [Online Information Services]: Web-based Services;
F.2.2 [Nonnumerical Algorithms and Problems]:
Computations on Discrete Structures


General Terms 
Algorithms, Experimentation, Theory 


Keywords 


Service selection problem, Automatic service composition, 
Service matchmaking, Nonfunctional properties 


1. INTRODUCTION


There exists an increasing body of work on automated 
service selection, based on criteria such as quality of ser- 
vice (QoS), trust, cost, etc. Some works focus on service 
matchmaking [26, 20, 2, 19, 28, 11, 3, 4, 21, 12, 13], that 
is, a process that given a service request returns the set 
of available services that can be used to fulfill that request 
(offers may be ranked according to their similarity to the re- 
quest). Some other papers focus on modelling nonfunctional 
properties such as the above criteria, that induce preference 
orderings on the available services [16, 5, 17]. 

However, no paper tackles in depth the optimization prob- 
lem that follows matchmaking and nonfunctional property 
evaluation: What is the best way of binding each service
request to a matching service? The problem may be non- 
trivial if the optimization involves multiple service requests 
at once. Consider for example composite services; they can 
be modelled as workflows [6, 12], where each activity poten- 
tially corresponds to a different service. In this framework, 
the decision problem consists in finding an optimal match- 
ing (w.r.t. the adopted criteria) between the set of activities 
occurring in the workflow and the set of available services 
that can be used to carry out those activities. 

In this paper we consider optimal service selection based 
on a given set of service requests (such as the activities oc- 
curring in a workflow), a set of service offers (the available 
services), the result of the matchmaking process (that asso- 
ciates each request to the set of offers that can satisfy it), 
and a numeric preference measure. Numeric measures are 
well-suited to a number of preference criteria of practical in- 
terest, based on costs of various sorts, as well as bandwidth, 
trust [1, 30, 24, 29], and other QoS criteria. Moreover, dif- 
ferent criteria can often be merged into a single numerical 
value [5]. 

Preferences and costs may be associated to services, ser- 
vice invocations, or both, as illustrated by the following ex- 
amples: 


• Trust is often associated to services, not service
  invocations. User preferences driven by privacy protection
  and security usually refer to services, independently of
  the specific call.

• However, an information service may be trusted on
  some queries and not on others; in this case trust is
  associated to individual invocations.

• Some services have an activation cost or a registration
  cost, to be paid only the first time the service is
  invoked, or before the first use. Such costs are associated
  to the services and do not depend on the number
  of calls nor on their nature.

• Other services work on a pay-per-use basis (such as
  paper downloads from a digital library and other electronic
  purchases). In this case, costs and preferences
  may depend on the specific request and are associated
  to each service invocation. Some services have both a
  per-use cost and an activation cost (such as telephone
  providers).

• Bandwidth and transmission speed may also vary across
  different service calls. For example, a service may be
  faster at certain times of the day. Another example is
  given by connection costs that depend on the duration
  of each particular call.


In this paper we shall contribute to the understanding of 
the service selection problem (SSP, for short) by formalizing 
and studying three classes of SSP problems where selection 
is based on costs and on two different QoS-like criteria, re- 
spectively. For simplicity, in this paper we assume that costs 
and preferences are totally ordered and static (i.e., time in- 
dependent); partially ordered and dynamic nonfunctional 
properties will be dealt with in a forthcoming paper. 

We shall prove that in general, and despite the aforementioned
simplifying assumptions, the optimal service selection
problem is harder than NP (unless the polynomial hierarchy
collapses). More precisely, some SSPs are in FP^NP,
like many famous hard optimization problems, while checking
whether the optimal cost equals a given constant K is
DP-complete. We shall identify practical cases where the
problem can be solved in polynomial time. In particular, we 
show that the high computational complexity of the service 
selection problem is caused by the one-time costs associated 
to service offers (e.g., initialization and registration costs). 
In the absence of one-time costs, the optimal selection prob- 
lem can be solved in polynomial time by applying a greedy 
approach. Finally, we shall illustrate the results of an ex- 
perimental evaluation of both exact and heuristic algorithms 
over different classes of problem instances. 

The paper is organized as follows. In Section 2 we recall 
the definition of the complexity classes needed in this paper. 
In Section 3 the service selection problems are formalized. 
Section 4 contains the complexity results and the algorithms 
for the first class of SSPs (based on cost-like criteria), and 
reports the experimental results for these algorithms. Sec- 
tion 5 illustrates the complexity results and the algorithms 
for the remaining two classes of SSPs (based on QoS-like 
criteria). Section 6 concludes the paper with a discussion 
of the results and a list of interesting directions for future 
work. 


2. PRELIMINARIES ON COMPLEXITY 


We assume the reader to be familiar with the basics of 
computational complexity. We refer to [22] for more details. 

The class DP is a class of decision problems containing
NP. DP can be defined as the class of all languages L such
that L = L_1 ∩ L_2, for some L_1 in NP and some L_2 in co-NP.
If L_1 and L_2 are complete for NP and co-NP, respectively,
then L is complete for DP.

The class FP^NP is the class of all function problems (i.e.,
problems that compute a value, not only a yes-no answer)
that can be solved in polynomial time by a deterministic
Turing machine with an oracle for NP.

Many standard optimization problems are complete for
FP^NP. For example, the Traveling Salesman Problem and
Max-weight SAT are FP^NP-complete [22, Chapter 17.1].


3. PROBLEM FORMALIZATION 


The instances of the service selection problems (SSP)
addressed in this paper are tuples ⟨R, O, M, c, k⟩ where:


• R = {1, 2, ..., m} is a nonempty set of service requests;

• O = {1, 2, ..., n} is a nonempty set of service offers;

• M ⊆ R × O is a matching between requests and offers
  such that

  $$\forall r \in R,\ \exists s \in O,\ (r,s) \in M \tag{1}$$

  (intuitively, if some request cannot be satisfied then
  the decision phase is never reached);

• c : O → ℚ is a function that assigns a cost or quality
  measure c_s to each offer s ∈ O;

• k : M → ℚ is a function that assigns a cost or quality
  measure k_rs ∈ ℚ to each pair (r, s) ∈ M (that is, to
  each possible service call).


The goal is finding a binding between requests and offers, 
compatible with the given matching and optimal w.r.t. the 
preferences associated to services and invocations. 

Formally, a binding for ⟨R, O, M, c, k⟩ is a total function
b : R → O such that b ⊆ M.¹ Condition (1) on M ensures
that a binding always exists.

As anticipated in the introduction, in this paper the opti- 
mality of bindings will be evaluated against different objec- 
tive functions. 

The first objective function, denoted by C_b, is appropriate
for criteria based on totally ordered costs (money, time,
etc.).² The overall cost of a binding is obtained by summing
up the costs of all the calls specified in the binding, plus the
one-time costs associated to the called services (e.g.,
initialization and registration costs). More precisely, let

$$b[R] = \{\, s \in O \mid \exists r \in R.\ b(r) = s \,\}$$


denote the range of b (informally speaking, b[R] is the set 
of services “used” by b); then the total cost of binding b is 


given by

$$C_b = \sum_{r \in R} k_{r\,b(r)} + \sum_{s \in b[R]} c_s. \tag{2}$$
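As an illustration of equation (2), the following minimal Python sketch evaluates the total cost of a binding. The function name `binding_cost`, the dictionary encoding of b, c and k, and the sample numbers are assumptions made for this example only.

```python
def binding_cost(b, c, k):
    """Total cost C_b of a binding, following equation (2).

    b: dict mapping each request r to the offer b(r)
    c: dict mapping each offer s to its one-time cost c_s
    k: dict mapping each pair (r, s) in M to the per-call cost k_rs
    """
    per_call = sum(k[(r, s)] for r, s in b.items())
    # Each offer in the range b[R] pays its one-time cost exactly once.
    one_time = sum(c[s] for s in set(b.values()))
    return per_call + one_time


# Hypothetical instance with two requests and two offers.
c = {"s1": 5, "s2": 1}
k = {(1, "s1"): 1, (1, "s2"): 4, (2, "s1"): 1, (2, "s2"): 4}

print(binding_cost({1: "s1", 2: "s1"}, c, k))  # 1 + 1 + 5 = 7
print(binding_cost({1: "s2", 2: "s2"}, c, k))  # 4 + 4 + 1 = 9
print(binding_cost({1: "s1", 2: "s2"}, c, k))  # 1 + 4 + 5 + 1 = 11
```

In this toy instance the offer with the higher one-time cost is the better choice once it is shared by both requests, which is the interaction that makes the general problem hard.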


The second objective function, denoted by Q_b, is appropriate
for many QoS-like criteria. Suppose that the aim of
the optimization problem, in this case, is maximizing
simultaneously the quality of each requested service. Then the
overall quality of a binding b can be modelled by summing
up the qualities of each selected request-offer match:

$$Q_b = \sum_{r \in R} f\bigl(k_{r\,b(r)},\, c_{b(r)}\bigr). \tag{3}$$


Here f : ℚ² → ℚ computes the quality of the solution provided
by the selected service b(r) to request r by appropriately
combining the measure k_{r b(r)} associated to the service
call and the measure c_{b(r)} associated to the service. We
assume only that f can be computed in polynomial time (w.r.t.
the given instance), because different applications may
require different functions f.

For example, suppose r is satisfied by invoking b(r) via a
network connection. Packet rate is influenced both by the
server's speed and by the bandwidth allowed by the
intermediate routers; the lowest rate determines the overall rate
for the connection. Suppose the values k_ij measure the
packet rate allowed by the connection between i and j, and
the values c_j measure the packet rate of the servers; then it
is appropriate to set f = min.

¹This inclusion means that for all requests r ∈ R,
(r, b(r)) ∈ M.

²Totally ordered costs are typically appropriate for uniform
costs, and for multi-dimensional costs with a total preference
over dimensions (e.g., money over time, and so on).

For another example, suppose the values k_ij measure the
quality of the connections between i and j, and the values
c_j measure the level of trust in the information released by
service j. Then a service b(r) may be preferred because (i)
the quality of the connection is good, and at the same time
(ii) the level of trust in b(r) is high. In this case f = min does
not seem adequate; it is not sensitive to any increment of the
maximal argument, therefore it does not force simultaneous
improvement of the two values k_ij and c_j. A function more
sensitive to both of its parameters seems more appropriate
(e.g., one may adopt f = + or f = ×).

The third objective function, denoted by Q'_b, is appropriate
for QoS-like criteria, too. Sometimes, in a compound
service, the quality of the worst component service affects
the quality of the entire service. For example, the overall
privacy preservation degree of a compound service issuing
a set of requests R is determined by the minimal privacy
preservation degree of the service components (i.e., the
individual invocations b(r)). In this kind of scenario, the quality
estimates f(k_{r b(r)}, c_{b(r)}) are combined by taking their
minimum:

$$Q'_b = \min \bigl\{\, f\bigl(k_{r\,b(r)},\, c_{b(r)}\bigr) \mid r \in R \,\bigr\}. \tag{4}$$
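The two quality measures of equations (3) and (4) can be compared on a small example. The sketch below is illustrative only: the names `quality_sum` and `quality_bottleneck`, the dictionary encoding, the choice f = min, and the sample values are assumptions, not part of the formalization.

```python
def quality_sum(b, c, k, f=min):
    """Q_b of equation (3): sum of the per-request quality estimates."""
    return sum(f(k[(r, s)], c[s]) for r, s in b.items())


def quality_bottleneck(b, c, k, f=min):
    """Q'_b of equation (4): quality of the worst request-offer match."""
    return min(f(k[(r, s)], c[s]) for r, s in b.items())


# Hypothetical measures: c[s] is the server rate, k[(r, s)] the link rate.
c = {"a": 10, "b": 3}
k = {(1, "a"): 4, (1, "b"): 9, (2, "a"): 8, (2, "b"): 2}

b1 = {1: "a", 2: "a"}  # estimates: min(4, 10) = 4 and min(8, 10) = 8
b2 = {1: "b", 2: "a"}  # estimates: min(9, 3) = 3 and min(8, 10) = 8

print(quality_sum(b1, c, k), quality_bottleneck(b1, c, k))  # 12 4
print(quality_sum(b2, c, k), quality_bottleneck(b2, c, k))  # 11 3
```

Passing a different two-argument function as `f` (for instance `lambda x, y: x + y`) models the more sensitive combinations discussed above.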


The three objective functions C_b, Q_b, and Q'_b induce three
classes of SSP:


SSP_C: Given an SSP instance I, find a binding b for I that
minimizes the cost function C_b.


SSP_Q: Given an SSP instance I, find a binding b for I that
maximizes the quality function Q_b.


SSP_Q': Given an SSP instance I, find a binding b for I that
maximizes the quality function Q'_b.


The last two problems, SSP_Q and SSP_Q', are not much different
from each other, as stated by the following result:


THEOREM 3.1. For each SSP instance I,


1. All the solutions of I under SSP_Q are also solutions of
I under SSP_Q'.


2. Conversely, at least one solution of I under SSP_Q' is
also a solution of I under SSP_Q.


Intuitively, the reason is that SSP_Q' considers only the
bottlenecks, while SSP_Q tries to improve all services.


4. COMPLEXITY OF AND ALGORITHMS
FOR SSP_C


We prove that SSP_C is NP-hard by reduction from the
Uncapacitated Facility Location Problem (UFLP), which is
defined as follows. We are given a bipartite graph (F, C)
with a set F of n facilities and a set C of m cities. Let f_j
represent the cost of opening a facility at location j in F, and
c_ij represent the cost of serving city i from an open facility j.
The goal is to find a subset I of F along with an assignment
function φ : C → I to assign the cities such that the total
cost is minimized.

There is a great variety of types of facility location prob- 
lems depending on the features of the components that con- 
tribute in the model definition. Some basic classes of facility 
location problems are listed below. 
1. If there are given upper bounds on the number of cities
a facility can serve, then the corresponding problem is 
classified as a Capacitated Facility Location Problem. 


2. If some data are given by a probability distribution, 
the problem is considered to be stochastic; otherwise 
it is referred to as a deterministic one. 


3. If the decision process is concerned not only with the 
location of the facilities to be open but also with “the 
moment” of their opening, then the corresponding prob- 
lem is called a Dynamic Facility Location Problem; 
otherwise it is called a Static Facility Location Prob- 
lem. 


The uncapacitated facility location problem we need in 
this paper is static and deterministic and admits the follow- 
ing integer programming formulation. 


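$$
\begin{aligned}
\text{(UFLP)}\quad \min\ & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}\, x_{ij} + \sum_{j=1}^{n} f_j\, y_j \\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ij} = 1, && 1 \le i \le m && \text{(a)} \\
& y_j - x_{ij} \ge 0, && 1 \le i \le m,\ 1 \le j \le n && \text{(b)} \\
& x_{ij},\ y_j \in \{0,1\}, && 1 \le i \le m,\ 1 \le j \le n,
\end{aligned}
$$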


where constraints (a) impose that each city is assigned to
exactly one facility, while constraints (b) restrict assignments
to open facilities only.

Despite their simple formulation, most location problems
are very difficult to solve. Except for some special cases,
their decision version (for a given K, is there a solution
with cost ≤ K?) has been shown to be NP-hard by reduction
from the Vertex Cover Problem (membership in NP is
straightforward). An extensive survey of location problems,
their complexities and applications can be found in the book
edited by Mirchandani and Francis [18].

By setting R = C, O = F, c_j = f_j, and k_ij = c_ij
(1 ≤ i ≤ m, 1 ≤ j ≤ n), UFLP can be reduced to SSP_C, and
vice versa. Then, we can prove the following result:


PROPOSITION 4.1. Deciding whether the optimal
cost of a given instance of SSP_C is less than or equal to a
given rational K is NP-complete.
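The correspondence between UFLP and SSP_C used above can be sketched in a few lines of Python. This is only an illustration: the function name `uflp_to_ssp` and the list-based input encoding are assumptions, and the sketch takes M = R × O, i.e. it assumes that every facility can serve every city.

```python
def uflp_to_ssp(open_cost, serve_cost):
    """Map a UFLP instance to an SSP instance (R, O, M, c, k).

    open_cost[j-1]       is f_j, the cost of opening facility j
    serve_cost[i-1][j-1] is c_ij, the cost of serving city i from facility j
    """
    m, n = len(serve_cost), len(open_cost)
    R = list(range(1, m + 1))              # requests = cities
    O = list(range(1, n + 1))              # offers = facilities
    M = {(i, j) for i in R for j in O}     # assumption: complete matching
    c = {j: open_cost[j - 1] for j in O}   # one-time costs c_j = f_j
    k = {(i, j): serve_cost[i - 1][j - 1] for (i, j) in M}  # k_ij = c_ij
    return R, O, M, c, k


R, O, M, c, k = uflp_to_ssp([3, 5], [[1, 2], [4, 0], [7, 7]])
print(len(R), len(O), len(M))  # 3 2 6
print(c[2], k[(2, 1)])         # 5 4
```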


With this result, we can express the optimality check 
as the conjunction of an NP-complete test and a co-NP- 
complete test, so we get the following theorem. 


THEOREM 4.2. Deciding whether the optimal cost of a
given instance of SSP_C equals a given rational K is
DP-complete.


The optimal cost can be computed through a binary search 
of K, based (by the above proposition) on an oracle for NP. 
This procedure provides an upper bound to the complexity 
of the optimization problem. 


THEOREM 4.3. Computing the optimal cost of a
given instance of SSP_C is in FP^NP.
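The binary search behind this upper bound can be sketched as follows. The sketch assumes non-negative integer costs, so that the search range is finite and discrete; the function names are illustrative, and `decision_oracle` is a brute-force (exponential-time) stand-in for the NP oracle, included only to make the example runnable.

```python
from itertools import product


def decision_oracle(R, O, M, c, k, K):
    """Stand-in for the NP oracle: is there a binding of cost <= K?"""
    choices = [[s for s in O if (r, s) in M] for r in R]
    for pick in product(*choices):
        cost = sum(k[(r, s)] for r, s in zip(R, pick))
        cost += sum(c[s] for s in set(pick))
        if cost <= K:
            return True
    return False


def optimal_cost(R, O, M, c, k):
    """Binary search on K; the number of oracle calls is logarithmic in hi."""
    lo = 0
    # Upper bound: pay every one-time cost and the worst call cost per request.
    hi = sum(c.values()) + sum(
        max(k[(r, s)] for s in O if (r, s) in M) for r in R
    )
    while lo < hi:
        mid = (lo + hi) // 2
        if decision_oracle(R, O, M, c, k, mid):
            hi = mid        # a binding of cost <= mid exists
        else:
            lo = mid + 1    # every binding costs more than mid
    return lo


# Hypothetical instance with two requests and two offers.
R, O = [1, 2], ["s1", "s2"]
c = {"s1": 5, "s2": 1}
k = {(1, "s1"): 1, (1, "s2"): 4, (2, "s1"): 1, (2, "s2"): 4}
M = set(k)
print(optimal_cost(R, O, M, c, k))  # 7
```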


Note that by Theorem 4.2, the optimization problem is 
harder than NP, unless the polynomial hierarchy collapses. 
The source of complexity lies in the one-time costs associ- 
ated to services, as shown in the next two subsections. 


4.1 Exact and approximated algorithms 


The algorithms described in this section accept
slightly modified instances of SSP_C, where the function k
is extended to all of R × O by setting k_ij = +∞ for all
(i, j) ∈ (R × O) \ M.

Algorithm 1 solves exactly the problem in the obvious 
way, by exhaustively trying all possible bindings. The only 
optimization consists in aborting a tentative binding con- 
struction whenever the value of the current partial binding 
exceeds the best cost found so far. 
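A compact Python rendering of this exhaustive search with pruning is sketched below. It is not the implementation used in the experiments: the function name, the dictionary encoding (a missing pair (r, s) in `k` stands for an infinite cost) and the sample instance are assumptions. The pruning step additionally assumes non-negative costs, since otherwise a partial binding that exceeds the best cost could still be completed into a cheaper one.

```python
import math


def exhaustive_search(R, O, c, k):
    """Return (optimal cost, optimal binding) by trying all bindings."""
    best = {"cost": math.inf, "binding": None}
    binding = {}

    def search(i, called, partial):
        if partial >= best["cost"]:
            return  # abort: this partial binding cannot improve the best cost
        if i == len(R):
            # Complete binding that is strictly better than the best so far.
            best["cost"], best["binding"] = partial, dict(binding)
            return
        r = R[i]
        for s in O:
            if (r, s) not in k:
                continue  # k_rs = +infinity: s cannot serve r
            # The one-time cost c_s is paid only the first time s is called.
            extra = k[(r, s)] + (0 if s in called else c[s])
            binding[r] = s
            search(i + 1, called | {s}, partial + extra)
        binding.pop(r, None)

    search(0, frozenset(), 0)
    return best["cost"], best["binding"]


c = {"s1": 5, "s2": 1}
k = {(1, "s1"): 1, (1, "s2"): 4, (2, "s1"): 1, (2, "s2"): 4}
cost, binding = exhaustive_search([1, 2], ["s1", "s2"], c, k)
print(cost, binding)  # 7 {1: 's1', 2: 's1'}
```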

Nevertheless, the intractable nature of the problem makes
approximate solutions the natural choice for dealing with
large instances. The first constant-factor approximation
algorithm for facility location problems, due to Shmoys et al.,
appeared in the literature in 1997 [23]. In 1999 Guha and
Khuller [8] proved that it is impossible to get an approximation
guarantee better than 1.463 unless NP ⊆ DTIME[n^O(log log n)].
Since then, several papers have been published
along this line of research [10, 9, 14, 25].

In our exploration of the approximate solutions we have
implemented the best known approximation algorithm
(Algorithm 2) proposed by Mahdian et al. [15]. This algorithm
ensures that the ratio between the cost of the returned
solution and the optimal cost is bounded by 1.52. Algorithm 2
combines the greedy algorithm proposed by Jain et al. [9]
(Algorithm 3) with the idea of cost scaling and can be
implemented in quasi-linear time, as shown by the authors
using a result of Thorup [27].

We have also investigated a simple heuristic approach
requiring time O(mn^2). Algorithm 4 consists of two phases:
a greedy adaptive construction phase (line 3, calling
Algorithm 5) and a local search phase (lines 4-21). These
algorithms use the sets of requests R_s served by each service s,
formally defined by

$$R_s = \{\, r \in R \mid b(r) = s \,\}.$$

Algorithm 4 and Algorithm 5 return inverse bindings, represented
by pairs (y, {R_s}_{s∈O}) where (i) y is a boolean vector
such that y_s = 1 iff offer s is used in the binding, and (ii) the
family {R_s}_{s∈O} defines for each offer s the requests satisfied
by s.

Starting from an empty solution, the first phase (Algo- 
rithm 5) iteratively constructs a feasible solution in a greedy 
and adaptive fashion with a greedy function defined on both 
matching and service costs. At each iteration, a new match- 
ing is determined between an unmatched request and the 
most convenient offer. In future iterations, the cost of this 
offer will not be considered again while evaluating the greedy 
choice (a greedy adaptive schema). The running time of Al- 
gorithm 5 that performs this phase is O(mn). 

Starting from the feasible binding found by the construction
phase, the local search phase tries (in time O(mn)) to
find a better binding by slightly perturbing it. In particular,
for each invoked offer s ∈ {1, 2, ..., n} the algorithm looks
for an alternative and more convenient offer l ≠ s that can
serve the requests currently matched to s (l may have been
already associated to other requests, but not necessarily). If
such a service l is found, then all the requests served by s
are redirected to l.
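The two phases described above can be sketched in Python as follows. This is an illustrative rendering under stated assumptions rather than the C implementation used in the experiments: requests and offers are indexed from 0, `k` is a list of lists with `float("inf")` for non-matching pairs, and the function names are hypothetical.

```python
def greedy_adapt(m, n, c, k):
    """Construction phase: bind each request to the currently cheapest offer."""
    used = [False] * n                   # y_s
    served = [set() for _ in range(n)]   # R_s
    d = [0] * n                          # total cost of the calls to s
    total = 0
    for r in range(m):
        # The one-time cost c_s counts only while offer s is still unused.
        best = min(range(n), key=lambda s: (0 if used[s] else c[s]) + k[r][s])
        total += (0 if used[best] else c[best]) + k[r][best]
        used[best] = True
        served[best].add(r)
        d[best] += k[r][best]
    return used, served, d, total


def local_search(n, c, k, used, served, d, total):
    """Local search phase: move all requests of an offer s to a cheaper offer l."""
    for s in range(n):
        if not used[s]:
            continue
        for l in range(n):
            if l == s:
                continue
            q = sum(k[r][l] for r in served[s])  # cost of serving R_s from l
            gain = (c[s] + d[s]) - ((0 if used[l] else c[l]) + q)
            if gain > 0:
                served[l] |= served[s]
                d[l] += q
                served[s], d[s], used[s] = set(), 0, False
                used[l] = True
                total -= gain
                break
    return used, served, d, total


# Hypothetical instance: offer 0 is expensive to activate but cheap per call.
c = [5, 1]
k = [[1, 4], [1, 4]]
used, served, d, total = greedy_adapt(2, 2, c, k)
print(total)  # 9: both requests are bound to offer 1
used, served, d, total = local_search(2, c, k, used, served, d, total)
print(total, used)  # 7 [True, False]
```

On this instance the greedy phase is misled by the low activation cost of offer 1, and the local search phase recovers the binding of cost 7 by redirecting both requests to offer 0.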

This strategy is expected to work especially well in the
presence of multi-function services that grant per-use
discounts to users who register for many of the service's options.
In case of heavy use of these functionalities, the algorithm
is likely to find that the service is more convenient even if
its one-time cost is higher than those of the competing
services. An experimental evaluation of Algorithms 2 and 4 is
discussed in Section 4.4.


Algorithm 1
EXHAUSTIVESEARCH (UR, CO, PC, BC, c, k, b)

Inputs: UR: unmatched requests, CO: called offers,
  PC: partial cost, BC: best cost, c: vector of costs
  associated to services, k: matrix of costs associated to
  invocations.
Outputs: best cost and an optimal binding b.
begin
  if PC ≥ BC then
    return BC {abort search; keep current best cost}
  else if UR = ∅ then {we found a better complete solution, as PC < BC}
    save the current binding b;
    return PC {new best cost}
  else
    choose r ∈ UR;
    for all s such that k_rs < +∞ do
      b(r) := s; {bind r to s}
      if s ∈ CO then
        PC_s := PC + k_rs
      else
        PC_s := PC + k_rs + c_s
      BC := EXHAUSTIVESEARCH (UR \ {r}, CO ∪ {s}, PC_s, BC, c, k, b)
    return BC
end


4.2 A polynomially solvable subclass 


Let us suppose that the one-time costs associated to
services are null, that is:

$$\forall s \in O,\ c_s = 0 \tag{5}$$

(the costs k_rs associated to service invocations may be greater
than zero). This special case of SSP_C is equivalent to a
special transportation problem and can be polynomially solved
by following a greedy approach with greedy function given
by k : R × O → ℚ. The optimal solution activates all
services and simply matches each service request with its
cheapest service. It is easy to show that the optimal cost can be
computed through GREEDYADAPT (Algorithm 5) with null
input cost vector c:


THEOREM 4.4. The binding corresponding to the values
y, {R_s}_{s∈O} returned by Algorithm 5 is optimal if c is null.


Note that in this case GREEDYADAPT is a pure greedy al- 
gorithm running in O(mn) time. The next corollary follows 
immediately. 


COROLLARY 4.5. If (5) holds, then SSP_C can be solved in
time O(mn).
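When condition (5) holds, the requests can be handled independently of each other, as the following sketch suggests. The function name `cheapest_binding` and the dictionary encoding of k (with one entry per pair in M) are assumptions made for this example.

```python
def cheapest_binding(k):
    """Optimal binding when all one-time costs are zero.

    k: dict mapping each matching pair (r, s) to the per-call cost k_rs.
    A single pass over the matching pairs suffices, i.e. O(|M|) <= O(mn).
    """
    best = {}
    for (r, s), cost in k.items():
        if r not in best or cost < best[r][1]:
            best[r] = (s, cost)
    binding = {r: s for r, (s, _) in best.items()}
    return binding, sum(cost for _, cost in best.values())


k = {(1, "a"): 3, (1, "b"): 2, (2, "a"): 1, (2, "b"): 5}
print(cheapest_binding(k))  # ({1: 'b', 2: 'a'}, 3)
```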


4.3 The source of complexity of SSP_C


In the light of Section 4.2, it is interesting to investigate
the complexity of SSP_C when the invocation costs are null,
that is:

$$\forall (r,s) \in M,\ k_{rs} = 0 \tag{6}$$


Algorithm 2
1.52APPROX (m, n, c, k, δ)

Outputs: for δ = 1.504, 1.52-approximate binding, represented
  by y and {R_s}_{s∈O}, and its cost C.
begin
  for all s = 1 to n do
    c(s) := c(s) × δ
  (y, {R_s}_{s∈O}, C, d) := JAIN (m, n, c, k)
  for all s = 1 to n do
    c(s) := c(s) / δ
  bool := true
  while (bool) do
    max := 0
    for s = 1 to n s.t. y_s = 0 do
      Ĉ := 0, C := 0
      Q := ∅
      for r = 1 to m s.t. k_rs < +∞ do
        Q := Q ∪ {r}
        C := C + k_rs
        Ĉ := Ĉ + k_{r b(r)}
      if (max < (Ĉ − C − c_s) / c_s) then
        max := (Ĉ − C − c_s) / c_s
        v := s
        C_v := C
        Q_v := Q
    if (max > 0) then
      y_v := 1, d_v := C_v, R_v := Q_v
      for r ∈ R_v do
        j := b(r)
        d_j := d_j − k_rj
        R_j := R_j \ {r}
        if (R_j = ∅) then
          y_j := 0
        b(r) := v
    else
      bool := false
  return C
end


Algorithm 3
JAIN (m, n, c, k)

Outputs: 1.61-approximate binding, represented by y
  and {R_s}_{s∈O}, and its cost C.
begin
  C := 0
  for all r = 1 to m do
    b(r) := 0, budget(r) := 0
  for all s = 1 to n do
    y_s := 0
  while (there exists r ∈ {1, 2, ..., m} s.t. b(r) = 0) do
    for all r = 1 to m s.t. b(r) = 0 do
      budget(r) := budget(r) + 1
    for all s = 1 to n do
      if y_s = 0 then
        totoffer := 0, i := 0
        for all r = 1 to m do
          if (b(r) = 0) then
            if (budget(r) − k_rs > 0) then
              totoffer := totoffer + budget(r) − k_rs
              i := i + 1, L(i) := r
          else
            if (k_{r b(r)} − k_rs > 0) then
              totoffer := totoffer + k_{r b(r)} − k_rs
              i := i + 1, L(i) := r
        if (totoffer > c_s) then
          y_s := 1
          for k = 1 to i do
            v := L(k)
            j := b(v)
            if (j ≠ 0) then
              d_j := d_j − k_vj
              R_j := R_j \ {v}
              if (R_j = ∅) then
                y_j := 0
            b(v) := s, R_s := R_s ∪ {v}
            d_s := d_s + k_vs
      else
        for r = 1 to m do
          if (b(r) = 0) then
            if (budget(r) = k_rs) then
              R_s := R_s ∪ {r}
              b(r) := s, d_s := d_s + k_rs
  for all s = 1 to n do
    if (y_s = 1) then
      C := C + c_s + d_s
  return (y, {R_s}_{s∈O}, C, d)
end


Algorithm 4
GREEDYADAPTHEUR (m, n, c, k)

 1: Outputs: suboptimal binding, represented by y and
    {R_s}_{s∈O}, and its cost C.
 2: begin
 3: (y, {R_s}_{s∈O}, C, d) := GREEDYADAPT (m, n, c, k)
    {Local search phase}
 4: for all s = 1 to n do
 5:   if y_s = 1 then
 6:     improved := false
 7:     l := 1
 8:     while (not improved and l ≤ n) do
 9:       if l ≠ s then
10:         q := Σ_{r∈R_s} k_rl
11:         gain := (c_s + d_s) − [c_l(1 − y_l) + q]
12:         if gain > 0 then
13:           R_l := R_l ∪ R_s
14:           d_l := d_l + q
15:           R_s := ∅
16:           d_s := 0
17:           y_s := 0
18:           y_l := 1
19:           improved := true
20:           C := C − gain
21:       l := l + 1
22:     {end while}
23: return (y, {R_s}_{s∈O}, C)
24: end

Algorithm 5
GREEDYADAPT (m, n, c, k)

Outputs: suboptimal binding, represented by y and
  {R_s}_{s∈O}, its cost C and the costs d (see below).
begin
  for all s = 1 to n do
    {Init structures}
    R_s := ∅
    y_s := 0 {i.e. s not used}
    d_s := 0 {total cost of all calls to s}
  C := 0
  for all r = 1 to m do
    min := +∞
    for all s = 1 to n do
      if min > c_s(1 − y_s) + k_rs then
        min := c_s(1 − y_s) + k_rs
        best := s
    C := C + min
    R_best := R_best ∪ {r}
    d_best := d_best + k_{r best}
    y_best := 1
  return (y, {R_s}_{s∈O}, C, d)
end
(the costs c_j may be nonzero). In this case, SSP_C remains
difficult. Its computational complexity remains high even if
(6) holds and the costs c_j are all identical (but nonzero),
that is,

$$\forall \{s,t\} \subseteq O,\ c_s = c_t \neq 0. \tag{7}$$


To prove this, we note that the hitting set problem [7]
can be reduced to the decision version³ of SSP_C satisfying
(6) and (7). The hitting set problem can be formulated as
follows:


Given a finite set S, a collection of sets S_i ⊆ S
(1 ≤ i ≤ z), and a positive integer K, decide whether
there exists S' ⊆ S such that for all i = 1...z,
S' ∩ S_i ≠ ∅ and |S'| ≤ K.


The hitting set problem is known to be NP-complete. It
can be reduced to the decision version
of SSP_C under restrictions (6) and (7) by defining: R =
{1, ..., z}, O = S (we may assume w.l.o.g. that S is a finite
initial segment of ℕ), M = {(i, j) | j ∈ S_i}, and c = {1}^n.
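The reduction can be made concrete with a short Python sketch. The helper names are hypothetical, each S_i is assumed to be nonempty (so that condition (1) holds), and `brute_force_cost` is an exponential-time check included only to show that the optimal cost of the resulting instance coincides with the size of a minimum hitting set.

```python
from itertools import product


def hitting_set_to_ssp(S, subsets):
    """Build the SSP_C instance (R, O, M, c, k) for a hitting set instance."""
    R = list(range(1, len(subsets) + 1))   # one request per set S_i
    O = sorted(S)                          # one offer per element of S
    M = {(i, j) for i, S_i in zip(R, subsets) for j in S_i}
    c = {j: 1 for j in O}                  # identical nonzero one-time costs (7)
    k = {pair: 0 for pair in M}            # null invocation costs (6)
    return R, O, M, c, k


def brute_force_cost(R, O, M, c, k):
    """Optimal cost by enumeration of all bindings (illustration only)."""
    choices = [[s for s in O if (r, s) in M] for r in R]
    return min(
        sum(k[(r, s)] for r, s in zip(R, pick)) + sum(c[s] for s in set(pick))
        for pick in product(*choices)
    )


# No single element hits all three sets, but {2, 3} does.
R, O, M, c, k = hitting_set_to_ssp({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}])
print(brute_force_cost(R, O, M, c, k))  # 2
```

A hitting set of size at most K exists exactly when the constructed instance admits a binding of cost at most K, since the cost of a binding is the number of distinct offers it uses.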

Then, by analogy with the cost estimates for the general
case, we can prove that for the class of SSP_C instances
satisfying (6) and (7):


• Checking whether the optimal cost equals a given
  rational K is DP-complete.

• Computing the optimal cost is in FP^NP.

From this result and the results of the previous section, we
conclude that the costs c_j associated to services are entirely
responsible for the high computational complexity of SSP_C.
This holds even if the service offers all have the same cost.
Intuitively, in this case, it is hard to choose among services
with the same activation cost that compete by offering
different, partially overlapping sets of free functionalities.


4.4 Experimental results 


We performed some preliminary experiments with
Algorithms 1, 2, and 4, using a C implementation running on a
Pentium 4, 2.4 GHz, 512 MB.

To compare the algorithms, we applied them to a set of
300 randomly generated instances, according to the following
criteria. Recall that m is the number of requests and n
is the number of offers. We have considered instances with
5 ≤ m ≤ 100 and 100 ≤ n ≤ 10000 (assuming that the
set of offers in practice will be significantly larger than the
set of requests in a workflow). We fixed the range of the
invocation costs k to [0, 100] and the range of the one-time
costs c to [1, p · 100] for p = 0.1, 1, 10, in order to check the
influence of the relative weight of k and c. For each triple
(m, n, p), 10 instances have been randomly generated. Runs
longer than 1 hour have been killed.

Algorithm 1 (which computes an optimal solution) exhibited
a satisfactory performance for all the instances with
m ≤ 10 and n ≤ 100. The maximal elapsed time was 0.35
seconds.

The performance started to decrease for (m, n) = (15, 150).


• For (m, n) = (15, 150) the maximal elapsed time was
  5′31″.

• For (m, n) = (20, 200), 20% of the runs have been
  killed and the maximal elapsed time of the other runs
  was 21′01″.

• For (m, n) = (20, 200), 73% of the runs have been
  killed and the maximal elapsed time of non-killed runs
  was over 59 minutes.


³The decision version of SSP_C is: Given an SSP instance
and a cost K, decide whether there is a solution with cost
≤ K.


Algorithms 2 and 4 are much faster, of course. The former
has been killed only once (m = 100, n = 10000), the latter
has never been killed. The average time of Algorithm 2 was
2.5 minutes. Algorithm 4 seems to be faster (average less
than 30 seconds), but more extensive experiments are
needed to confirm and explain this observation.

Some of the average execution times of Algorithm 2 are 
reported in Figure 1. The figure illustrates both how ex- 
ecution time grows with the size of the problem instance, 
and the influence of one-time costs on performance. In par- 
ticular, it appears that as one-time costs become negligible, 
Algorithm 2 becomes faster. When the upper bound for 
one-time costs used by the random generator is one tenth of 
the upper bound for per-use costs, the average time drops 
down to 65.19 seconds. 

We measured the quality of the approximate solutions
returned by Algorithm 4 by evaluating the relative error of
each solution; the relative error is (A − C)/C where A is the
approximate cost and C is the optimal cost (we computed the
error only for those instances whose optimal cost was
available, i.e., the exact algorithm was not killed). The average
of the errors is around 70%, which is not bad for a naive
heuristic. Also in this case, we need more experiments to
validate this observation.


5. COMPLEXITY OF AND ALGORITHMS
FOR SSP_Q AND SSP_Q'


Unlike SSP_C, SSP_Q and SSP_Q' are always easy. These two
problems can be solved almost in the same way. Algorithm 6
solves the version of SSP with objective function Q_b.


Algorithm 6
GREEDYADAPT-Q (m, n, c, k)

 1: Outputs: an optimal binding b and its quality level L.
 2: begin
 3: L := 0
 4: for all r = 1 to m do
 5:   maxlev_r := −∞
 6:   for all s = 1 to n do
 7:     if maxlev_r < f(k_rs, c_s) then
 8:       maxlev_r := f(k_rs, c_s)
 9:       best := s
10:   L := L + maxlev_r
11:   b(r) := best
12: return (b, L)
13: end


To solve the version based on Q'_b, two changes to
Algorithm 6 are required: replace line 3 with

 3: L := +∞

and line 10 with

10: L := min{maxlev_r, L}.


It is not hard to prove the correctness of the two versions
of Algorithm 6 w.r.t. SSP_Q and SSP_Q'; from this property
and a straightforward analysis of Algorithm 6, we conclude
that:


THEOREM 5.1. SSP_Q and SSP_Q' can be solved in time
O(mn).
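Both variants can be captured by one Python function, sketched below under stated assumptions: the name `greedy_quality`, the `bottleneck` flag, the dictionary encoding and the default f = min are illustrative choices, and the stated time bound additionally presumes that f is evaluated in constant time.

```python
def greedy_quality(R, O, M, c, k, f=min, bottleneck=False):
    """Bind each request to the matching offer with the best quality estimate.

    Returns the binding and its quality level: the sum of the estimates
    (objective Q_b) or, if bottleneck is True, their minimum (objective Q'_b).
    """
    binding, levels = {}, []
    for r in R:
        candidates = [s for s in O if (r, s) in M]
        best = max(candidates, key=lambda s: f(k[(r, s)], c[s]))
        binding[r] = best
        levels.append(f(k[(r, best)], c[best]))
    return binding, (min(levels) if bottleneck else sum(levels))


# Hypothetical measures: c[s] is the server rate, k[(r, s)] the link rate.
c = {"a": 10, "b": 3}
k = {(1, "a"): 4, (1, "b"): 9, (2, "a"): 8, (2, "b"): 2}
M = set(k)

print(greedy_quality([1, 2], ["a", "b"], M, c, k))
# ({1: 'a', 2: 'a'}, 12)
print(greedy_quality([1, 2], ["a", "b"], M, c, k, bottleneck=True))
# ({1: 'a', 2: 'a'}, 4)
```

The same binding is returned in both cases; only the aggregation of the per-request estimates differs.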


This approach can be easily extended to any objective
function similar to Q_b and Q'_b, based on polynomially
computable, monotonic combination functions besides Σ and
min. The details will be given in an extended version of the
paper.


6. CONCLUSIONS 


Summarizing, we formalized three kinds of optimal service
selection problems (based on cost minimization and on
two different quality maximization criteria) and we proved
that the cost minimization problem, SSP_C, is generally hard,
while the two quality maximization problems, SSP_Q and
SSP_Q', can be solved in polynomial time. In particular, SSP_C
is in FP^NP and harder than NP (unless the polynomial
hierarchy collapses).

We proved that the reason for the high computational
complexity of SSP_C lies in the one-time costs associated to
service offers (such as initialization and registration costs).
When these costs are all null, SSP_C is solvable in polynomial
time (on the contrary, in the absence of per-use costs
the problem does not become easier).

We designed and implemented algorithms for computing
exact solutions for all these versions of SSP. The exact
algorithm for SSP_C (Algorithm 1) has been evaluated
experimentally. According to the current results, instances with
up to 10 requests and 100 offers can be nicely handled by
this algorithm; for larger instances, the performance quickly
decreases, making the algorithm inapplicable.

We have also designed and implemented suboptimal so- 
lutions and evaluated them experimentally. Currently, it 
seems that the algorithm with a guaranteed 0.52 bound on 
relative error is too slow for real-time service selection over 
large workflows and offer sets. The heuristic algorithm (Al- 
gorithm 4) seems to be faster, but it has no guarantees on 
the quality of the solution. 

We are planning to carry out more experiments to validate
and refine these preliminary observations. Moreover, we are
trying to sharpen the complexity bounds (we do not yet
know whether SSP_C is complete for FP^NP).

Finally, we are generalizing the framework presented in 
this paper by considering optimization problems that involve 
simultaneous cost minimization and quality maximization, 
as well as multidimensional measures that induce partially 
ordered measures of nonfunctional properties (although in 
several cases multiple criteria can be reduced to single, to- 
tally ordered numeric measures [5]). Another direction for 
generalization concerns time-dependent costs and preferences. 
Figure 1: Performance of Algorithm 2